Maintaining Sentiment Polarity in Translation of User-Generated Content
Authors
Abstract
The advent of social media has shaken the very foundations of how we share information, with Twitter, Facebook, and LinkedIn among the many well-known social networking platforms that facilitate the generation and distribution of information. However, Twitter's 140-character limit encourages users to write informally, sometimes deliberately. As a result, machine translation (MT) of user-generated content (UGC) becomes much more difficult for such noisy texts. Beyond its effect on translation quality, this phenomenon may also harm sentiment preservation in the translation process. That is, a sentence with positive sentiment in the source language may be translated into a sentence with negative or neutral sentiment in the target language. In this paper, we analyse both sentiment preservation and MT quality per se in the context of UGC, focusing especially on whether sentiment classification helps improve sentiment preservation in MT of UGC. We build four different experimental setups for tweet translation: (i) using a single MT model trained on the whole Twitter parallel corpus, (ii) using multiple MT models based on sentiment classification, (iii) using MT models that include additional out-of-domain data, and (iv) adding MT models based on the phrase-table fill-up method to accompany the sentiment translation models, with the aim of improving MT quality while preserving sentiment polarity. Our empirical evaluation shows that, despite a slight deterioration in MT quality, our system significantly outperforms the Baseline MT system (which does not use sentiment classification) in terms of sentiment preservation. We also demonstrate that using an MT engine that conveys a sentiment different from that of the UGC can even worsen both translation quality and sentiment preservation.
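As a rough illustration of setup (ii) and of what "sentiment preservation" measures, the Python sketch below routes each tweet to a sentiment-specific translation engine and then computes the share of tweets whose sentiment label is unchanged after translation. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `route_and_translate`, `preservation_rate`, and `toy_classify`, the three-way label set, and the keyword classifier are all hypothetical, and the abstract does not specify the actual classifier, MT engines, or evaluation metric.

```python
from typing import Callable, Dict, List

Label = str  # assumed label set: "positive", "negative", "neutral"
Translator = Callable[[str], str]


def toy_classify(text: str) -> Label:
    """Hypothetical keyword classifier, standing in for a real sentiment model."""
    words = set(text.lower().split())
    if words & {"love", "great"}:
        return "positive"
    if words & {"hate", "awful"}:
        return "negative"
    return "neutral"


def route_and_translate(
    tweet: str,
    classify: Callable[[str], Label],
    engines: Dict[Label, Translator],
    fallback: Translator,
) -> str:
    """Send the tweet to the engine matching its predicted sentiment.

    If no engine exists for the predicted label, use the fallback engine
    (for example, a single model trained on the whole corpus).
    """
    engine = engines.get(classify(tweet), fallback)
    return engine(tweet)


def preservation_rate(source_labels: List[Label], target_labels: List[Label]) -> float:
    """Fraction of sentence pairs whose source and target labels agree."""
    if len(source_labels) != len(target_labels):
        raise ValueError("label lists must have the same length")
    if not source_labels:
        return 0.0
    kept = sum(s == t for s, t in zip(source_labels, target_labels))
    return kept / len(source_labels)


# Placeholder "engines" that only tag the input; real MT systems would go here.
engines: Dict[Label, Translator] = {
    "positive": lambda t: "[pos-MT] " + t,
    "negative": lambda t: "[neg-MT] " + t,
}
baseline: Translator = lambda t: "[base-MT] " + t
```

In this sketch, a tweet classified as neutral falls back to the baseline engine because no neutral-specific engine is registered; whether the paper handles neutral tweets this way is not stated in the abstract.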
Similar resources
The Impact of Sentiment Analysis Output on Decision Outcomes: An Empirical Evaluation
User-generated online content serves as a source of product- and service-related information that reduces the uncertainty in consumer decision making, yet the abundance of such content makes it prohibitively costly to use all relevant information. Dealing with this (big data) problem requires a consumer to decide what subset of information to focus on. Peer-generated star ratings are excellent to...
A Fuzzy Computing Model for Identifying Polarity of Chinese Sentiment Words
With the spurt of online user-generated content on the web, sentiment analysis has become a very active research issue in data mining and natural language processing. As the most important indicators of sentiment, sentiment words, which convey positive and negative polarity, are quite instrumental for sentiment analysis. However, most of the existing methods for identifying polarity of sentiment word...
Sparse unsupervised feature learning for sentiment classification of short documents
The rapid growth of Web information led to an increasing amount of user-generated content, such as customer reviews of products, forum posts and blogs. In this paper we face the task of assigning a sentiment polarity to user-generated short documents to determine whether each of them communicates a positive or negative judgment about a subject. The method we propose exploits a Growing Hierarchi...
LT3: Sentiment Classification in User-Generated Content Using a Rich Feature Set
This paper describes our contribution to the SemEval-2014 Task 9 on sentiment analysis in Twitter. We participated in both strands of the task, viz. classification at message-level (subtask B), and polarity disambiguation of particular text spans within a message (subtask A). Our experiments with a variety of lexical and syntactic features show that our systems benefit from rich feature sets fo...
Sentiment Analysis of News Headlines using Naïve Bayes Classifier
The amount of user-generated content is increasing day by day, and it involves detection of opinions about a particular topic or object. Sentiment analysis is used to extract people's sentiments about products, movies, political events, etc. It identifies the viewpoint of the opinion holder and the polarity of the content, i.e. positive, negative, or neutral. Given the large amount of news being generate...
Publication date: 2017